Goto

Collaborating Authors

 substitute model





Adapt under Attack and Domain Shift: Unified Adversarial Meta-Learning and Domain Adaptation for Robust Automatic Modulation Classification

Owfi, Ali, Bamdad, Amirmohammad, Seyfi, Tolunay, Afghah, Fatemeh

arXiv.org Artificial Intelligence

Deep learning has emerged as a leading approach for Automatic Modulation Classification (AMC), demonstrating superior performance over traditional methods. However, vulnerability to adversarial attacks and susceptibility to data distribution shifts hinder their practical deployment in real-world, dynamic environments. To address these threats, we propose a novel, unified framework that integrates meta-learning with domain adaptation, making AMC systems resistant to both adversarial attacks and environmental changes. Our framework utilizes a two-phase strategy. First, in an offline phase, we employ a meta-learning approach to train the model on clean and adversarially perturbed samples from a single source domain. This method enables the model to generalize its defense, making it resistant to a combination of previously unseen attacks. Subsequently, in the online phase, we apply domain adaptation to align the model's features with a new target domain, allowing it to adapt without requiring substantial labeled data. As a result, our framework achieves a significant improvement in modulation classification accuracy against these combined threats, offering a critical solution to the deployment and operational challenges of modern AMC systems.




Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks

Nasr, Milad, Fratantonio, Yanick, Invernizzi, Luca, Albertini, Ange, Farah, Loua, Petit-Bianco, Alex, Terzis, Andreas, Thomas, Kurt, Bursztein, Elie, Carlini, Nicholas

arXiv.org Artificial Intelligence

As deep learning models become widely deployed as components within larger production systems, their individual shortcomings can create system-level vulnerabilities with real-world impact. This paper studies how adversarial attacks targeting an ML component can degrade or bypass an entire production-grade malware detection system, performing a case study analysis of Gmail's pipeline where file-type identification relies on a ML model. The malware detection pipeline in use by Gmail contains a machine learning model that routes each potential malware sample to a specialized malware classifier to improve accuracy and performance. This model, called Magika, has been open sourced. By designing adversarial examples that fool Magika, we can cause the production malware service to incorrectly route malware to an unsuitable malware detector thereby increasing our chance of evading detection. Specifically, by changing just 13 bytes of a malware sample, we can successfully evade Magika in 90% of cases and thereby allow us to send malware files over Gmail. We then turn our attention to defenses, and develop an approach to mitigate the severity of these types of attacks. For our defended production model, a highly resourced adversary requires 50 bytes to achieve just a 20% attack success rate. We implement this defense, and, thanks to a collaboration with Google engineers, it has already been deployed in production for the Gmail classifier.


Stabilizing Data-Free Model Extraction

Nguyen, Dat-Thinh, Le, Kim-Hung, Le-Khac, Nhien-An

arXiv.org Artificial Intelligence

Model extraction is a severe threat to Machine Learning-as-a-Service systems, especially through data-free approaches, where dishonest users can replicate the functionality of a black-box target model without access to realistic data. Despite recent advancements, existing data-free model extraction methods suffer from the oscillating accuracy of the substitute model. This oscillation, which could be attributed to the constant shift in the generated data distribution during the attack, makes the attack impractical since the optimal substitute model cannot be determined without access to the target model's in-distribution data. Hence, we propose MetaDFME, a novel data-free model extraction method that employs meta-learning in the generator training to reduce the distribution shift, aiming to mitigate the substitute model's accuracy oscillation. In detail, we train our generator to iteratively capture the meta-representations of the synthetic data during the attack. These meta-representations can be adapted with a few steps to produce data that facilitates the substitute model to learn from the target model while reducing the effect of distribution shifts. Our experiments on popular baseline image datasets, MNIST, SVHN, CIFAR-10, and CIFAR-100, demonstrate that MetaDFME outperforms the current state-of-the-art data-free model extraction method while exhibiting a more stable substitute model's accuracy during the attack.


Generating Transferrable Adversarial Examples via Local Mixing and Logits Optimization for Remote Sensing Object Recognition

Liu, Chun, Wang, Hailong, Zhu, Bingqian, Ding, Panpan, Zheng, Zheng, Xu, Tao, Han, Zhigang, Wang, Jiayao

arXiv.org Artificial Intelligence

Deep Neural Networks (DNNs) are vulnerable to adversarial attacks, posing significant security threats to their deployment in remote sensing applications. Research on adversarial attacks not only reveals model vulnerabilities but also provides critical insights for enhancing robustness. Although current mixing-based strategies have been proposed to increase the transferability of adversarial examples, they either perform global blending or directly exchange a region in the images, which may destroy global semantic features and mislead the optimization of adversarial examples. Furthermore, their reliance on cross-entropy loss for perturbation optimization leads to gradient diminishing during iterative updates, compromising adversarial example quality. To address these limitations, we focus on non-targeted attacks and propose a novel framework via local mixing and logits optimization. First, we present a local mixing strategy to generate diverse yet semantically consistent inputs. Different from MixUp, which globally blends two images, and MixCut, which stitches images together, our method merely blends local regions to preserve global semantic information. Second, we adapt the logit loss from targeted attacks to non-targeted scenarios, mitigating the gradient vanishing problem of cross-entropy loss. Third, a perturbation smoothing loss is applied to suppress high-frequency noise and enhance transferability. Extensive experiments on FGSCR-42 and MTARSI datasets demonstrate superior performance over 12 state-of-the-art methods across 6 surrogate models. Notably, with ResNet as the surrogate on MTARSI, our method achieves a 17.28% average improvement in black-box attack success rate.


I Stolenly Swear That I Am Up to (No) Good: Design and Evaluation of Model Stealing Attacks

Oliynyk, Daryna, Mayer, Rudolf, Grosse, Kathrin, Rauber, Andreas

arXiv.org Artificial Intelligence

Model stealing attacks endanger the confidentiality of machine learning models offered as a service. Although these models are kept secret, a malicious party can query a model to label data samples and train their own substitute model, violating intellectual property. While novel attacks in the field are continually being published, their design and evaluations are not standardised, making it challenging to compare prior works and assess progress in the field. This paper is the first to address this gap by providing recommendations for designing and evaluating model stealing attacks. To this end, we study the largest group of attacks that rely on training a substitute model -- those attacking image classification models. We propose the first comprehensive threat model and develop a framework for attack comparison. Further, we analyse attack setups from related works to understand which tasks and models have been studied the most. Based on our findings, we present best practices for attack development before, during, and beyond experiments and derive an extensive list of open research questions regarding the evaluation of model stealing attacks. Our findings and recommendations also transfer to other problem domains, hence establishing the first generic evaluation methodology for model stealing attacks.